I haven't needed SQL this ugly before
December 19, 2010 4:07 AM Subscribe
What SQL magic do I need to turn one column into several?
I need to print some tickets, each of which has enough room to hold one set of customer details and up to five items ordered by that customer. Customers who have ordered more than 5 items get multiple tickets. So from an orders table like this,
I need to print some tickets, each of which has enough room to hold one set of customer details and up to five items ordered by that customer. Customers who have ordered more than 5 items get multiple tickets. So from an orders table like this,
Customer | Item ---------|------ Bob | FTMCH Bob | ZORP Bob | KLUGE Carol | FTMCH Carol | MEEP Carol | ZORP Ted | FOON Ted | SMOCK Alice | ORGO Carol | SQICK Carol | BLECH Carol | KLUGE Carol | GLURPI need a query that returns this:
Customer | Item1 | Item2 | Item3 | Item4 | Item5 ---------|-------|-------|-------|-------|------ Alice | ORGO | null | null | null | null Bob | FTMCH | ZORP | KLUGE | null | null Carol | FTMCH | MEEP | ZORP | SQICK | BLECH Carol | KLUGE | GLURP | null | null | null Ted | FOON | SMOCK | null | null | nullCan some kind soul help me with the SQL for this? HSQL embedded database in OpenOffice.org Base, if it makes a difference.
Response by poster:
posted by flabdablet at 5:09 AM on December 19, 2010
DECODESo it looks to me like DECODE in SQL is kind of like a switch or case statement in procedural languages; but I'm too dim to see its immediate application here. What have I missed?
DECODE( value expr main, value expr match 1, value expr result 1 [...,] [, value expr default] )
DECODE takes at least 3 arguments. The value expr main is compared with value expr match 1 and if it matches, value expr result 1 is returned. If there are additional pairs of value expr match n and value expr result n, comparison is repeated until a match is found the result is returned. If no match is found, the value expr default is returned if it is specified, otherwise NULL is returned. The type of the return value is a combination of the types of the value expr result ... arguments. (HyperSQL)
posted by flabdablet at 5:09 AM on December 19, 2010
Response by poster: Done. But I bet I get better answers quicker from AskMe.
posted by flabdablet at 6:06 AM on December 19, 2010
posted by flabdablet at 6:06 AM on December 19, 2010
On my phone so I can't do the whole thing, will check back when I'm home, but
You can probably do it with a subquery that adds a rank or rownumber column, then 5 case statements on a modulo of your new column and a group by.
Shame hsql doesn't have a pivot command like the newer mssql.
posted by ish__ at 7:11 AM on December 19, 2010
You can probably do it with a subquery that adds a rank or rownumber column, then 5 case statements on a modulo of your new column and a group by.
Shame hsql doesn't have a pivot command like the newer mssql.
posted by ish__ at 7:11 AM on December 19, 2010
Best answer: Preamble: none of this is tested, I just woke up, I've never used hsql only MSSql.
So, looks like theres no rank/rownum in hsql - but you can fake it with something like http://support.microsoft.com/kb/18613 (adapt to your table/hsql):
Using that, get a subquery with your customer and item column and a new rank column in it - ordering the rows by customer asc, item asc. Hereafter this is "sq".
Let me know if this works, doesn't work, doesn't make sense or if you have a followup and I'm happy to try to work with you til it does.
posted by ish__ at 7:30 AM on December 19, 2010
So, looks like theres no rank/rownum in hsql - but you can fake it with something like http://support.microsoft.com/kb/18613 (adapt to your table/hsql):
select rank=count(*), a1.au_lname, a1.au_fname from authors a1, authors a2 where a1.au_lname + a1.au_fname >= a2.au_lname + a2.au_fname group by a1.au_lname, a1.au_fname order by rank
Using that, get a subquery with your customer and item column and a new rank column in it - ordering the rows by customer asc, item asc. Hereafter this is "sq".
select sq.customer, case when mod(rank,5) = 0 then max(item) end, case when mod(rank,5) = 1 then max(item) end, case when mod(rank,5) = 2 then max(item) end, case when mod(rank,5) = 3 then max(item) end, case when mod(rank,5) = 4 then max(item) end from sq group by sq.customer, divide_integers(rank,5)I don't know what sort of fanciness you have to do in hsql to divide integers and get an integer results. In MSSql its a pain in the ass so we often have a function for it to cast them both to floats then drop the decimal part and convert back to integer. I assume you can duplicate that in hsql somehow.
Let me know if this works, doesn't work, doesn't make sense or if you have a followup and I'm happy to try to work with you til it does.
posted by ish__ at 7:30 AM on December 19, 2010
What you are trying to do looks like a crosstabulation, though you will have to add an artificial column like ish__ says - try googling for that.
Another approach would be to use GROUP_CONCAT to gather all the item codes for each customer, and then select again with a splitting function to get the tickets into different columns. I don't know how you would limit the grouping to five items per customer though, it looks like your dataset needs something like an item_number column to do this in any meaningful way.
I have to ask though, why do you want to transform your dataset like that? If it's a matter of printing the tickets, can't you do it with a grouped report with the customer info in the group header?
posted by Dr Dracator at 7:30 AM on December 19, 2010 [1 favorite]
Another approach would be to use GROUP_CONCAT to gather all the item codes for each customer, and then select again with a splitting function to get the tickets into different columns. I don't know how you would limit the grouping to five items per customer though, it looks like your dataset needs something like an item_number column to do this in any meaningful way.
I have to ask though, why do you want to transform your dataset like that? If it's a matter of printing the tickets, can't you do it with a grouped report with the customer info in the group header?
posted by Dr Dracator at 7:30 AM on December 19, 2010 [1 favorite]
Response by poster: ish__, that or something very like it looks like it will work. It's way past my bedtime but I will play with it in the morning and let you know. Thanks!
Dr Dracator, I'm replacing a back end, the existing front end expects to be fed a CSV file formatted like the query result above, and I'm under moderate time pressure for a usable first cut. In the fullness of time I will indeed be replacing the front end as well, at which point grouped reports are probably indeed the Right Thing.
posted by flabdablet at 7:51 AM on December 19, 2010
Dr Dracator, I'm replacing a back end, the existing front end expects to be fed a CSV file formatted like the query result above, and I'm under moderate time pressure for a usable first cut. In the fullness of time I will indeed be replacing the front end as well, at which point grouped reports are probably indeed the Right Thing.
posted by flabdablet at 7:51 AM on December 19, 2010
A similar question (not sql, though) was asked two weeks ago here.
posted by cnanderson at 8:04 AM on December 19, 2010
posted by cnanderson at 8:04 AM on December 19, 2010
I just looked at the answer on stack overflow, and your response. Is there a way to do what that guy said by first running a query that results in this:
posted by gjc at 9:39 AM on December 19, 2010
Customer | Item | Count ---------|--------|------ Alice | ORGO | 1 Bob | FTMCH | 1 Bob | ZORP | 2 Bob | KLUGE | 3 Carol | FTMCH | 1 Carol | MEEP | 2 Carol | ZORP | 3 Carol | SQICK | 4 Carol | BLECH | 5 Carol | KLUGE | 6 Carol | GLURP | 7 Ted | FOON | 1 Ted | SMOCK | 2
posted by gjc at 9:39 AM on December 19, 2010
Response by poster: OK, this works well enough:
posted by flabdablet at 7:37 AM on December 20, 2010
SELECT "Customer", MAX(CASE WHEN "Slot" = 0 THEN "Item" END) AS "Item1", MAX(CASE WHEN "Slot" = 1 THEN "Item" END) AS "Item2", MAX(CASE WHEN "Slot" = 2 THEN "Item" END) AS "Item3", MAX(CASE WHEN "Slot" = 3 THEN "Item" END) AS "Item4", MAX(CASE WHEN "Slot" = 4 THEN "Item" END) AS "Item5" FROM ( SELECT l."Customer" AS "Customer", l."Item" AS "Item", COUNT(r."Item") / 5 AS "Ticket", MOD(COUNT(r."Item"), 5) AS "Slot" FROM "Orders" AS l LEFT JOIN "Orders" AS r ON r."Customer" = l."Customer" AND r."Item" < l."Item" GROUP BY "Customer", "Item" ) GROUP BY "Customer", "Ticket" ORDER BY "Customer", "Ticket"It makes this:
Customer | Item1 | Item2 | Item3 | Item4 | Item5 ---------|-------|-------|-------|-------|------- Alice | ORGO | | | | Bob | FTMCH | KLUGE | ZORP | | Carol | BLECH | FTMCH | GLURP | KLUGE | MEEP Carol | SQICK | ZORP | | | Ted | FOON | SMOCK | | |Thanks to all who helped, both here and at Stack Overflow.
posted by flabdablet at 7:37 AM on December 20, 2010
Response by poster: Jesus, this just gets worse :-(
Turns out the business rules allow the same customer to order the same item on multiple occasions, and that all outstanding orders are to be included on the one set of tickets. So my toy table should have looked more like this:
Spiritually, I'm an embedded-systems guy, not a database guy. Can anybody who does this for a living tell me whether this kind of nonsense is common? Would a query with four nested SELECTs and a LEFT JOIN merit a mention on the Daily WTF?
posted by flabdablet at 12:31 AM on December 21, 2010
Turns out the business rules allow the same customer to order the same item on multiple occasions, and that all outstanding orders are to be included on the one set of tickets. So my toy table should have looked more like this:
ID | Customer | Item 159 | Bob | FTMCH 264 | Bob | ZORP 265 | Bob | KLUGE 288 | Carol | FTMCH 314 | Carol | MEEP 323 | Carol | ZORP 327 | Ted | FOON 338 | Ted | SMOCK 358 | Alice | ORGO 419 | Carol | SQICK 716 | Carol | MEEP 846 | Carol | BLECH 939 | Carol | MEEP 950 | Carol | GLURP 979 | Carol | KLUGECarol's multiple MEEPs bugger the ranking logic in the original solution, and I've ended up with the following hideous monster:
SELECT "Customer", MAX(CASE WHEN "Slot" = 0 THEN "Item" END) AS "Item0", MAX(CASE WHEN "Slot" = 1 THEN "Item" END) AS "Item1", MAX(CASE WHEN "Slot" = 2 THEN "Item" END) AS "Item2", MAX(CASE WHEN "Slot" = 3 THEN "Item" END) AS "Item3", MAX(CASE WHEN "Slot" = 4 THEN "Item" END) AS "Item4", MAX(CASE WHEN "Slot" = 0 THEN "Quantity" END) AS "Qty0", MAX(CASE WHEN "Slot" = 1 THEN "Quantity" END) AS "Qty1", MAX(CASE WHEN "Slot" = 2 THEN "Quantity" END) AS "Qty2", MAX(CASE WHEN "Slot" = 3 THEN "Quantity" END) AS "Qty3", MAX(CASE WHEN "Slot" = 4 THEN "Quantity" END) AS "Qty4" FROM ( SELECT "Customer", "Item", COUNT("ID") AS "Quantity", "Rank" / 5 AS "Ticket", MOD("Rank", 5) AS "Slot" FROM ( SELECT main."ID" AS "ID", main."Customer" AS "Customer", main."Item" AS "Item", COUNT(less."Item") AS "Rank" FROM "Orders" AS main LEFT JOIN ( SELECT DISTINCT "Customer", "Item" FROM "Orders") AS less ON less."Customer" = main."Customer" AND less."Item" < main."Item" GROUP BY "Customer", "Item", "ID" ) GROUP BY "Customer", "Item", "Rank" ) GROUP BY "Customer", "Ticket"which makes this:
Customer | Item0 | Item1 | Item2 | Item3 | Item | Qty0 | Qty1 | Qty2 | Qty3 | Qty3 | Qty4 Bob | FTMCH | KLUGE | ZORP | | | 1 | 1 | 1 | 0 | 0 | 0 Carol | BLECH | FTMCH | GLURP | KLUGE | MEEP | 1 | 1 | 1 | 1 | 1 | 3 Carol | SQICK | ZORP | | | | 1 | 1 | 0 | 0 | 0 | 0 Ted | FOON | SMOCK | | | | 1 | 1 | 0 | 0 | 0 | 0 Alice | ORGO | | | | | 1 | 0 | 0 | 0 | 0 | 0It does the job, I guess, but I'm feeling pretty lucky that the database involved is always going to be quite small (a few thousand rows).
Spiritually, I'm an embedded-systems guy, not a database guy. Can anybody who does this for a living tell me whether this kind of nonsense is common? Would a query with four nested SELECTs and a LEFT JOIN merit a mention on the Daily WTF?
posted by flabdablet at 12:31 AM on December 21, 2010
Not really doing this for a living, but I can tell you I have seen much, much worse in the wild.
posted by Dr Dracator at 5:34 AM on December 21, 2010
posted by Dr Dracator at 5:34 AM on December 21, 2010
flabadablet: the question in Dr Dracator's first response, the "why are you doing it like this at all? shouldn't your front-end do formatting stuff?" I ask that *every single day* at work. I've seen guys write a 10 page select statement with grouping 6 or 7 levels deep and have it crash the SQL server and wonder why? Much, much, MUCH worse exists in the wild for companies you wouldn't even think about.
posted by ish__ at 2:12 PM on December 21, 2010
posted by ish__ at 2:12 PM on December 21, 2010
This thread is closed to new comments.
posted by Confess, Fletch at 4:42 AM on December 19, 2010